(54) Subframe-based correlation

(57) A subframe-based correlation method for pitch and voicing is provided by finding the pitch track through a speech frame that minimizes pitch prediction residual energy over the frame. The method scans the range of possible time lags T and computes for each subframe, within a given range of T, the maximum correlation value, and further finds the set of subframe lags that maximizes the correlation over all possible pitch lags.

EP 0 955 627 A2

Description 

TECHNICAL FIELD OF THE INVENTION 

[0001] This invention relates to a method of correlating portions of an input signal, such as is used for pitch estimation and voicing.

BACKGROUND OF THE INVENTION 

[0002] The problem of reliable estimation of pitch and voicing has been a critical issue in speech coding for many years. Pitch estimation is used, for example, in both Code-Excited Linear Predictive (CELP) coders and Mixed Excitation Linear Predictive (MELP) coders. The pitch is how fast the glottis is vibrating. The pitch period is the time period of the repeating waveform; the pitch is the number of these repetitions over a time period. In the digital environment the analog signal is sampled, so the pitch period is expressed as T samples. In the case of the MELP coder, artificial pulses are used to produce synthesized speech, and the pitch is determined so that the speech sounds right. The CELP coder also uses the estimated pitch in the coder. The CELP coder quantizes the difference between the periods. In the MELP coder, a synthetic excitation signal is used to make synthetic speech; it is a mix of pulses for the pulse part of speech and noise for the unvoiced part of speech. The voicing analysis determines how much is pulse and how much is noise. The degree of voicing correlation is also used to do this. We do that by breaking the signal into frequency bands, and in each frequency band we use the correlation at the pitch value as a measure of how voiced that frequency band is. The pitch period is determined by examining all possible lags or delays, where a delay shifts the signal back by T samples. In the correlation one looks for the highest correlation value.

[0003] Correlation strength is a function of pitch lag. We search that function to find the best lag. For each lag we get a correlation strength, which is a measure of the degree to which the model fits.
[0004] When we find the best lag, we get the pitch, and we also get the correlation strength at that lag, which is used for voicing.

[0005] For pitch we compute the correlation of the input against itself:

$$C(T) = \sum_{n=0}^{N-1} x_n x_{n-T}$$


[0006] In the prior art this correlation is computed on a whole-frame basis to get the best predictable value, or minimum prediction error, on a frame basis. The error is

$$E = \sum_n \left(x_n - \hat{x}_n\right)^2$$

where the predicted value $\hat{x}_n = g x_{n-T}$ is a version of the signal delayed by T and scaled by g, a scale factor which is also referred to as the pitch prediction coefficient:

$$E = \sum_n \left(x_n - g x_{n-T}\right)^2$$

so one tries to vary the time delay T to find the optimum delay or lag.

[0007] It is assumed that in the prior art g and T are constant over the whole frame. 
[0008] It is known that g and T are not constant over a whole frame. 

SUMMARY OF THE INVENTION 


[0009] In accordance with one embodiment of the present invention, a subframe-based correlation method for pitch 
and voicing is provided by finding the pitch track through a speech frame that minimizes the pitch-prediction residual 
energy over the frame assuming that the optimal pitch prediction coefficient will be used for each subframe lag. 






DESCRIPTION OF THE DRAWINGS 



[0010]

Fig. 1 is a flow chart of the basic subframe correlation method according to one embodiment of the present invention;
Fig. 2 is a block diagram of a multi-modal CELP coder;
Fig. 3 is a flow diagram of a method characterizing voiced and unvoiced speech with the CELP coder of Fig. 2;
Fig. 4 is a block diagram of a MELP coder; and
Fig. 5 is a block diagram of an analyzer used in the MELP coder of Fig. 4.



DESCRIPTION OF PREFERRED EMBODIMENTS OF THE PRESENT INVENTION 

[0011] In accordance with one embodiment of the present invention, there is provided a method for computing correlation that can account for changes in pitch within a frame by using subframe-based correlation to account for variations over a frame. The objective is to find the pitch track through a speech frame that minimizes the pitch prediction residual energy over the frame, assuming that the optimal pitch prediction coefficient will be used for each subframe lag $T_s$. Formally, this error can be written as a sum over $N_s$ subframes:

$$E = \sum_{s=1}^{N_s} \left[ \sum_n x_n^2 - \frac{\left(\sum_n x_n x_{n-T_s}\right)^2}{\sum_n x_{n-T_s}^2} \right] \qquad (1)$$

where $x_n$ is the nth sample of the input signal and the sum over n includes all the samples in subframe s. Minimizing the pitch prediction error or residual energy is equivalent to finding the set of subframe lags $\{T_s\}$ that maximizes the correlation. The term after the minus sign is what reduces the error, so maximizing it minimizes the error; we therefore seek the maximum over the set of subframe lags:

$$\max_{\{T_s\}} \sum_{s=1}^{N_s} \frac{\left(\sum_n x_n x_{n-T_s}\right)^2}{\sum_n x_{n-T_s}^2} \qquad (2)$$

We find the set $\{T_s\}$ that maximizes the double sum, that is, the maximum over the set of $T_s$ from s=1 to $N_s$ (the whole frame). According to the present invention, we also impose the constraint that each subframe pitch lag $T_s$ must be within a certain range or constraint $\Delta$ of an overall pitch value T:

$$T = \max_{T=\text{lower}}^{\text{upper}} \left[ \sum_{s=1}^{N_s} \max_{T_s = T-\Delta}^{T+\Delta} \frac{\left(\sum_n x_n x_{n-T_s}\right)^2}{\sum_n x_{n-T_s}^2} \right] \qquad (3)$$

We therefore search for the maximum over all possible pitch lags T (from lower to upper). The overall T we are finding is the one that gives the maximum value. Note that without the pitch tracking constraint the overall prediction error is minimized by finding the optimal lag for each subframe independently. This method incorporates the energy variations from one subframe to the next.
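
The double search of equation (3) can be written directly, without any buffering, as a brute-force loop. The following C code is a sketch under stated assumptions and is not taken from the application: the names `sub_score` and `track_pitch`, the equal subframe length, the limit of 16 subframes, and the clamping of negative correlations to zero are all illustrative choices.

```c
#include <assert.h>
#include <float.h>

/* Inner term of equation (3) for one subframe at lag t.
 * x[-t] .. x[len-1] must be valid. Negative correlations score zero
 * (an assumption of this sketch). */
static double sub_score(const float *x, int len, int t)
{
    double cross = 0.0, energy = 0.0;
    for (int n = 0; n < len; n++) {
        cross  += (double)x[n] * x[n - t];
        energy += (double)x[n - t] * x[n - t];
    }
    return (cross > 0.0 && energy > 0.0) ? cross * cross / energy : 0.0;
}

/* Hypothetical brute-force form of the double search.
 * x[-upper] .. x[num_sub*sub_len - 1] must be valid.
 * Returns the overall lag T and writes the subframe lags to sub_lag. */
int track_pitch(const float *x, int num_sub, int sub_len,
                int lower, int upper, int delta, int *sub_lag)
{
    int best_T = lower;
    double best_sum = -DBL_MAX;
    int cand[16];

    assert(x != 0 && sub_lag != 0 && sub_len > 0);
    assert(num_sub > 0 && num_sub <= 16);
    assert(lower > 0 && lower <= upper && delta >= 0);

    for (int T = lower; T <= upper; T++) {          /* outer search over T */
        double sum = 0.0;
        for (int s = 0; s < num_sub; s++) {
            const float *xs = x + s * sub_len;
            double best = -1.0;
            /* inner search: T_s within delta of T, kept inside the range */
            for (int ts = T - delta; ts <= T + delta; ts++) {
                if (ts < lower || ts > upper)
                    continue;
                double c = sub_score(xs, sub_len, ts);
                if (c > best) {
                    best = c;
                    cand[s] = ts;
                }
            }
            sum += best;
        }
        if (sum > best_sum) {                       /* keep the best path */
            best_sum = sum;
            best_T = T;
            for (int s = 0; s < num_sub; s++)
                sub_lag[s] = cand[s];
        }
    }
    return best_T;
}
```

This form recomputes every subframe correlation 2Δ+1 times as T advances, which is the redundancy that a buffer of recent correlation values would avoid.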
[0012] In accordance with the present invention as illustrated in Fig. 1, a subframe-based correlation method is achieved by a processor programmed according to equation (3) above.

[0013] After initialization in step 101, the program scans (step 102) the whole range of lag times T, for example from 20 to 160 samples:

For T = $T_{min}$ to $T_{max}$ (20 to 160 samples)

The program involves a double search. Given a T, the inner search is performed across subframe lags $\{T_s\}$ within the constraint $\Delta$ of that T. We also want the maximum correlation value over all possible values of T. In step 103 the program computes, for each T, the maximum correlation value of

$$\frac{\left(\sum_n x_n x_{n-T_s}\right)^2}{\sum_n x_{n-T_s}^2}$$

for the subframe s, where the search range for the subframe is $2\Delta+1$ lag values (for a typical value of $\Delta=5$, 11 lag values). We find the $T_s$ giving the maximum value out of the $2\Delta+1$ lag values held in a circular buffer (104). For example, if T=50 the subframe lag $T_s$ varies from 45 to 55, so we search the 11 values in each subframe. When T goes to 51 the range of $T_s$ is 46 to 56. All but one of these values were previously computed, so we use the circular buffer (104): we add the new correlation value for $T_s=56$ and remove the old one corresponding to $T_s=45$. We then find the $T_s$ among these 11 that gives the maximum correlation value. This is done for all values of T (step 103). The program then looks for the best T overall by summing the maximum correlation values over the subframes for each T, comparing the sums, and storing the T and the set of $T_s$ that correspond to the maximum sum. This can be done by keeping a running sum over the subframes for each lag T from $T_{min}$ to $T_{max}$ (step 105) and comparing the current sum with the previous best running sum for other lags T (step 107). The greatest value represents the best correlation value and is stored (step 110). This can be done by the program comparing the sum for each set of subframes with each previous sum and selecting the greater. The program ends after reaching the maximum lag $T_{max}$ (step 109), and the best is stored. A C-code example of the search for the best pitch path follows, where pcorr is the running sum, v_inner is a function returning the inner product of two vectors

$$\sum_n x_n x_{n-T_s},$$

temp*temp is the squaring, v_magsq is

$$\sum_n x_{n-T_s}^2,$$

and maxloc is the location of the maximum in the circular buffer:

/* Search for best pitch path */
for (i = lower; i <= upper; i++) {

    pcorr = 0.0;

    /* Search pitch range over subframes */
    c_begin = sig_in;
    for (j = 0; j < num_sub; j++) {

        /* Add new correlation to circular buffer */
        /* use backward correlations */
        c_lag = c_begin - i - range;
        if (i + range > upper)
            /* don't go outside pitch range */
            corr[j][nextk[j]] = -FLT_MAX;
        else {
            temp = v_inner(c_begin, c_lag, sub_len[j]);
            if (temp > 0.0)
                corr[j][nextk[j]] =
                    temp * temp / v_magsq(c_lag, sub_len[j]);
            else
                corr[j][nextk[j]] = 0.0;
        }

        /* Find maximum of circular buffer */
        maxloc = 0;
        temp = corr[j][maxloc];
        for (k = 1; k < range2; k++) {
            if (corr[j][k] > temp) {
                temp = corr[j][k];
                maxloc = k;
            }
        }

        /* Save best subframe pitch lag */
        if (maxloc <= nextk[j])
            sub_p[j] = i + range + maxloc - nextk[j];
        else
            sub_p[j] = i + range + maxloc - range2 - nextk[j];

        /* Update correlations with pitch doubling check */
        pdbl = 1.0 - (sub_p[j] * (1.0 - DOUBLE_VAL) / (upper));
        pcorr += temp * pdbl * pdbl;

        /* Increment circular buffer pointer and c_begin */
        nextk[j]++;
        if (nextk[j] >= range2)
            nextk[j] = 0;
        c_begin += sub_len[j];
    }

    /* check for new maxima with pitch doubling */
    if (pcorr > maxcorr) {
        /* New max: update correlation and pitch path */
        maxcorr = pcorr;
        v_equ_int(ipitch, sub_p, num_sub);
    }
}
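
The step that converts a circular-buffer slot back into a lag (the `sub_p[j]` assignment in the listing above) is easy to misread, so it is isolated here. This is a sketch based on one reading of that listing: the helper name `slot_to_lag` is hypothetical, and `range2 = 2*range + 1` is an assumption inferred from the 2Δ+1 buffer length described in the text.

```c
#include <assert.h>

/* Hypothetical helper mirroring the sub_p[j] computation.
 * At outer-loop value i the newest correlation, for lag i + range,
 * has just been written to slot nextk. Older lags sit in the slots
 * before it, wrapping around the end of the buffer. */
int slot_to_lag(int i, int range, int nextk, int slot)
{
    int range2 = 2 * range + 1;   /* assumed buffer length */

    assert(range >= 0);
    assert(nextk >= 0 && nextk < range2);
    assert(slot >= 0 && slot < range2);

    if (slot <= nextk)
        return i + range + slot - nextk;          /* no wrap-around */
    return i + range + slot - range2 - nextk;     /* wrapped entries */
}
```

With range = 5, the newest slot holds lag i+5 and the slot immediately after it holds the oldest lag, i-5, which matches the 45 to 55 window given in the text for T=50.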



[0014] For voicing we need to calculate the normalized correlation coefficient (correlation strength) ρ for the best pitch path found above. For this we need a value between -1 and +1, which we use as the voicing strength. We take the path of $T_s$ determined above and use that set of values $T_s$ to compute the normalized correlation

$$\rho(T) = \sqrt{\frac{\displaystyle\sum_{s=1}^{N_s} \frac{\left(\sum_n x_n x_{n-T_s}\right)^2}{\sum_n x_{n-T_s}^2}}{\displaystyle\sum_{s=1}^{N_s} \sum_n x_n^2}} \qquad (4)$$

[0015] We go back and recompute for the subframe lags $T_s$. We evaluate ρ only for the winning path $T_s$. We could either save the correlation values when computing the subframe sets $T_s$ and then compute ρ using equation (4) above, or recompute them. See step 111 in Fig. 1.

[0016] An example of C code for calculating the normalized correlation for the pitch path follows:

    /* Calculate normalized correlation for pitch path */
    pcorr = 0.0;
    pnorm = 0.0;
    c_begin = sig_in;
    for (j = 0; j < num_sub; j++) {
        c_lag = c_begin - ipitch[j];
        temp = v_inner(c_begin, c_lag, sub_len[j]);
        if (temp > 0.0)
            temp = temp * temp / v_magsq(c_lag, sub_len[j]);
        else
            temp = 0.0;
        pcorr += temp;
        pnorm += v_magsq(c_begin, sub_len[j]);
        c_begin += sub_len[j];
    }
    pcorr = sqrt(pcorr / (pnorm + 0.01));

    /* Return overall correlation strength */
    return (pcorr);
}




[0017] The present invention includes extensions to the basic invention, including modifications to deal with pitch doubling, forward/backward prediction and fractional pitch.

[0018] Pitch doubling is a well-known problem where a pitch estimator returns a pitch value twice as large as the true pitch. This is caused by an inherent ambiguity in the correlation function: any signal that is periodic with period T has a correlation of 1 not just at lag T but also at any integer multiple of T, so there is no unique maximum of the correlation function. To address this problem, we introduce a weighting function w(T) that penalizes longer pitch lags T.
[0019] In accordance with a preferred embodiment, the weighting is

$$w(T_s) = \left(1 - T_s \frac{D}{T_{max}}\right)^2$$

with a typical value for D of 0.1. The value D determines how strong the weighting is: the larger the D, the larger the penalty. The best value is determined experimentally. The weighting is applied on a subframe basis and is represented by substep block 103a within block 103. The overall value of the equation in substep block 103b of block 103 is weighted by multiplying by $w(T_s)$.

This pitch doubling weighting is found in the pitch-path search code provided above (the lines that compute pdbl and update pcorr) and is done on the subframe basis in the inner loop. The typical formulation of pitch prediction uses forward prediction, where the prediction is of the current samples based on previous samples. This is an appropriate model for predictive encoding, but for pitch estimation it introduces an asymmetry in the importance of the input samples used for the current frame, where the values at the start of the frame contribute more to the pitch estimation than samples at the end of the frame. This problem is addressed by combining both forward and backward prediction, where backward prediction refers to prediction of the current samples from future ones. For the first half of the frame, we predict current samples from future values (backward prediction), while for the second half of the frame we predict current samples from past samples (forward prediction). This extends the total prediction error to the following:



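$$E = \sum_{s=1}^{N_s/2} \left[ \sum_n x_n^2 - \frac{\left(\sum_n x_n x_{n+T_s}\right)^2}{\sum_n x_{n+T_s}^2} \right] + \sum_{s=N_s/2+1}^{N_s} \left[ \sum_n x_n^2 - \frac{\left(\sum_n x_n x_{n-T_s}\right)^2}{\sum_n x_{n-T_s}^2} \right] \qquad (5)$$

[0020] Finding the subframe lags using equation (5) would be

$$\max_{\{T_s\}} \left[ \sum_{s=1}^{N_s/2} \frac{\left(\sum_n x_n x_{n+T_s}\right)^2}{\sum_n x_{n+T_s}^2} + \sum_{s=N_s/2+1}^{N_s} \frac{\left(\sum_n x_n x_{n-T_s}\right)^2}{\sum_n x_{n-T_s}^2} \right]$$

Placing the constraint of $\Delta$, the computation in step 103b for the overall T would be

$$T = \max_{T=\text{lower}}^{\text{upper}} \left[ \sum_{s=1}^{N_s/2} \max_{T_s=T-\Delta}^{T+\Delta} \frac{\left(\sum_n x_n x_{n+T_s}\right)^2}{\sum_n x_{n+T_s}^2} + \sum_{s=N_s/2+1}^{N_s} \max_{T_s=T-\Delta}^{T+\Delta} \frac{\left(\sum_n x_n x_{n-T_s}\right)^2}{\sum_n x_{n-T_s}^2} \right] \qquad (6)$$

This operation is illustrated by the following program: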

/* Search for best pitch path */
for (i = lower; i <= upper; i++) {

    pcorr = 0.0;

    /* Search pitch range over subframes */
    for (j = 0; j < num_sub; j++) {

        /* Add new correlation to circular buffer */
        c_begin = &sig_in[j * sub_len];

        /* check forward or backward correlations */
        if (j < num_sub2)
            c_lag = c_begin + i + range;
        else
            c_lag = c_begin - i - range;

        if (i + range > upper)
            /* don't go outside pitch range */
            corr[j][nextk[j]] = -FLT_MAX;
        else {
            temp = v_inner(c_begin, c_lag, sub_len);
            if (temp > 0.0)
                corr[j][nextk[j]] =
                    temp * temp / v_magsq(c_lag, sub_len);
            else
                corr[j][nextk[j]] = 0.0;
        }

        /* Find maximum of circular buffer */
        maxloc = 0;
        temp = corr[j][maxloc];
        for (k = 1; k < range2; k++) {
            if (corr[j][k] > temp) {
                temp = corr[j][k];
                maxloc = k;
            }
        }

        /* Save best subframe pitch lag */
        if (maxloc <= nextk[j])
            sub_p[j] = i + range + maxloc - nextk[j];
        else
            sub_p[j] = i + range + maxloc - range2 - nextk[j];

        /* Update correlations with pitch doubling check */
        pdbl = 1.0 - (sub_p[j] * (1.0 - DOUBLE_VAL) / (upper));
        pcorr += temp * pdbl * pdbl;

        /* Increment circular buffer pointer */
        nextk[j]++;
        if (nextk[j] >= range2)
            nextk[j] = 0;
    }

    /* check for new maxima with pitch doubling */
    if (pcorr > maxcorr) {
        /* New max: update correlation and pitch path */
        maxcorr = pcorr;
        v_equ_int(ipitch, sub_p, num_sub);
    }
}
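
The pitch-doubling weight applied through `pdbl` in the search programs can be isolated as a small function. This is a sketch only: the name `pitch_weight` is hypothetical, and the correspondence D = 1.0 - DOUBLE_VAL with `upper` playing the role of $T_{max}$ is an inference from comparing the `pdbl` line with the weighting formula, not something the text states.

```c
#include <assert.h>

/* Hypothetical helper: w(T_s) = (1 - T_s * D / T_max)^2.
 * ts    subframe lag in samples
 * t_max largest lag in the search range
 * d     penalty strength D, between 0 and 1 */
double pitch_weight(int ts, int t_max, double d)
{
    assert(t_max > 0 && ts >= 0 && ts <= t_max);
    assert(d >= 0.0 && d <= 1.0);

    double a = 1.0 - (double)ts * d / (double)t_max;
    return a * a;   /* squared, matching pdbl * pdbl in the listings */
}
```

As a worked example with D = 0.1 and $T_{max}$ = 160, a lag of 80 is weighted by (1 - 0.05)^2 = 0.9025 and a lag of 160 by (1 - 0.1)^2 = 0.81, so a doubled lag must show a clearly higher raw correlation before it is preferred.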



[0021] Another problem with traditional correlation measures is that they can only be computed for pitch lags that consist of an integer number of samples. However, for some signals this is not sufficient resolution, and a fractional value for the pitch is desired. For example, if the pitch is between 40 and 41, we need to find the fraction of a sampling period (q). We have previously shown that a linear interpolation formula can provide this correlation for a frame-based case. To incorporate this into the subframe pitch estimator, one can use the fractional pitch interpolation formula for the subframe estimate $\rho_s(T_s)$ instead of the integer pitch shown in equation (3). This fractional pitch estimation can be derived from the equation in column 8 of U.S. Patent No. 5,699,477, incorporated herein by reference, where p is $T_s$ and c is the inner product of the two vectors

$$c(t_1, t_2) = \sum_n x_{n-t_1} x_{n-t_2}.$$

For example,

$$c(0, T+1) = \sum_n x_n x_{n-(T+1)}.$$

The fraction q of a sampling period to add to $T_s$ equals:

$$q = \frac{c(0,T_s+1)\,c(T_s,T_s) - c(0,T_s)\,c(T_s,T_s+1)}{c(0,T_s+1)\left[c(T_s,T_s) - c(T_s,T_s+1)\right] + c(0,T_s)\left[c(T_s+1,T_s+1) - c(T_s,T_s+1)\right]} \qquad (7)$$

[0022] The normalized correlation uses the second formula in column 8 for each of the subframes we are using. For this equation p is $T_s$ and c is the inner product, so:

$$\rho_s(T_s+q) = \frac{(1-q)\,c(0,T_s) + q\,c(0,T_s+1)}{\sqrt{c(0,0)\left[(1-q)^2 c(T_s,T_s) + 2q(1-q)\,c(T_s,T_s+1) + q^2 c(T_s+1,T_s+1)\right]}} \qquad (8)$$

Equation (4) gives the normalized correlation for integer lags. This becomes

$$\rho(T) = \sqrt{\frac{\sum_{s=1}^{N_s} p_s\, \rho_s^2(T_s)}{\sum_{s=1}^{N_s} p_s}}, \quad \text{where } p_s = \sum_n x_n^2 \text{ and } \rho_s(T_s) = \frac{\sum_n x_n x_{n-T_s}}{\sqrt{\sum_n x_n^2 \sum_n x_{n-T_s}^2}} \qquad (9)$$

[0023] The values for $\rho_s(T_s+q)$ in equation (8) are substituted for $\rho_s(T_s)$ in equation (9) above to get the normalized correlation at the fractional pitch period.

[0024] An example of code for computing normalized correlation strengths using fractional pitch follows, where temp is $\rho_s(T_s+q)$, $p_s$ is v_magsq(c_begin, length), pcorr is $\rho(T)$ and c0_T is c(0,T):
/*
    Subroutine sub_pcorr: subframe pitch correlations
*/

float sub_pcorr(float sig_in[], int pitch[], int num_sub, int length)
{
    int num_sub2 = num_sub / 2;
    int j, forward;
    float *c_begin, *c_lag;
    float temp, pcorr;

    /* Calculate normalized correlation for pitch path */
    pcorr = 0.0;
    for (j = 0; j < num_sub; j++) {
        c_begin = &sig_in[j * length];

        /* check forward or backward correlations */
        if (j < num_sub2)
            forward = 1;
        else
            forward = 0;
        if (forward)
            c_lag = c_begin + pitch[j];
        else
            c_lag = c_begin - pitch[j];

        /* fractional pitch */
        frac_pch2(c_begin, &temp, pitch[j], PITCHMIN, PITCHMAX, length, forward);
        if (temp > 0.0)
            temp = temp * temp * v_magsq(c_begin, length);
        else
            temp = 0.0;
        pcorr += temp;
    }

    pcorr = sqrt(pcorr / (v_magsq(&sig_in[0], num_sub * length) + 0.01));
    return (pcorr);
}


/*
    frac_pch2.c: Determine fractional pitch.
*/

#include <math.h>   /* sqrt, fabs */

#define MAXFRAC 2.0
#define MINFRAC -1.0

float frac_pch2(float sig_in[], float *pcorr, int ipitch, int pmin, int pmax,
                int length, int forward)
{
    float c0_0, c0_T, c0_T1, cT_T, cT_T1, cT1_T1, c0_Tm1;
    float frac, frac1;
    float fpitch, denom;

    /* Estimate needed crosscorrelations */
    if (ipitch >= pmax)
        ipitch = pmax - 1;
    if (forward) {
        c0_T = v_inner(&sig_in[0], &sig_in[ipitch], length);
        c0_T1 = v_inner(&sig_in[0], &sig_in[ipitch + 1], length);
        c0_Tm1 = v_inner(&sig_in[0], &sig_in[ipitch - 1], length);
    }
    else {
        c0_T = v_inner(&sig_in[0], &sig_in[-ipitch], length);
        c0_T1 = v_inner(&sig_in[0], &sig_in[-ipitch - 1], length);
        c0_Tm1 = v_inner(&sig_in[0], &sig_in[-ipitch + 1], length);
    }

    if (c0_Tm1 > c0_T1) {
        /* fractional component should be less than 1, so decrement pitch */
        c0_T1 = c0_T;
        c0_T = c0_Tm1;
        ipitch--;
    }

    c0_0 = v_inner(&sig_in[0], &sig_in[0], length);
    if (forward) {
        cT_T = v_inner(&sig_in[ipitch], &sig_in[ipitch], length);
        cT_T1 = v_inner(&sig_in[ipitch], &sig_in[ipitch + 1], length);
        cT1_T1 = v_inner(&sig_in[ipitch + 1], &sig_in[ipitch + 1], length);
    }
    else {
        cT_T = v_inner(&sig_in[-ipitch], &sig_in[-ipitch], length);
        cT_T1 = v_inner(&sig_in[-ipitch], &sig_in[-ipitch - 1], length);
        cT1_T1 = v_inner(&sig_in[-ipitch - 1], &sig_in[-ipitch - 1], length);
    }

    /* Find fractional component of pitch within integer range */
    denom = c0_T1 * (cT_T - cT_T1) + c0_T * (cT1_T1 - cT_T1);
    if (fabs(denom) > 0.01)
        frac = (c0_T1 * cT_T - c0_T * cT_T1) / denom;
    else
        frac = 0.5;
    if (frac > MAXFRAC)
        frac = MAXFRAC;
    if (frac < MINFRAC)
        frac = MINFRAC;

    /* Make sure pitch is still within range */
    fpitch = ipitch + frac;
    if (fpitch > pmax)
        fpitch = pmax;
    if (fpitch < pmin)
        fpitch = pmin;
    frac = fpitch - ipitch;

    /* Calculate interpolated correlation strength */
    frac1 = 1.0 - frac;
    denom = c0_0 * (frac1 * frac1 * cT_T + 2 * frac * frac1 * cT_T1 +
                    frac * frac * cT1_T1);
    denom = sqrt(denom);
    if (fabs(denom) > 0.01)
        *pcorr = (frac1 * c0_T + frac * c0_T1) / denom;
    else
        *pcorr = 0.0;

    /* Return full floating point pitch value */
    return (fpitch);
}

#undef MAXFRAC
#undef MINFRAC




[0025] The subframe-based estimate herein has application to the multi-modal CELP coder as described in the application of Paksoy and McCree, Serial No. 08/999,433, filed 12/29/97 (TI-23721). This application is incorporated herein by reference and a copy is provided in Appendix A. A block diagram of this CELP coder is illustrated in Fig. 2. This subframe-based pitch estimate can be used as an estimate for the initial (open-loop) pitch estimation gain for a subframe in place of a frame. This is step 104 in Fig. 2 of the cited application and is presented as Fig. 3 herein. Fig. 3 illustrates a flow chart of a method of characterizing voiced and unvoiced speech in the CELP coder. In accordance with the present invention, one searches over the pitch range for the pitch lag T with maximum correlation as given above. The weighting function described above is used to penalize pitch doubles. For this example, only forward prediction and integer pitch estimates are used. This open-loop pitch estimate constrains the pitch range for the later closed-loop procedure. In addition, the normalized correlation ρ can be incorporated into a multi-modal CELP coder as a measure of voicing.
[0026] The Mixed Excitation Linear Predictive (MELP) coder was recently adopted as the new U.S. Federal Standard at 2.4 kb/s. Although 2.4 kb/s is considered a low bit rate, there is a desire to go to an even lower rate. Fig. 4 illustrates a MELP synthesizer with mixed pulse and noise excitation, periodic pulses, adaptive spectral enhancement, and a pulse dispersion filter. This subframe-based method is used for both pitch and voicing estimation. A MELP coder is described in applicants' U.S. Patent No. 5,699,477, incorporated herein by reference. The pitch estimation is used for the pitch extractor 604 of the speech analyzer of Fig. 6 in the above-cited MELP patent. This is illustrated herein as Fig. 5. For pitch estimation the value of T is varied over the entire pitch range and the pitch value T is found for the maximum values (maximum set of subframes $T_s$). We also find the highest normalized correlation ρ of the low-pass filtered signal, with the additional pitch doubling logic provided by the weighting function described above to penalize pitch doubles. The forward/backward prediction is used to maintain a centered window, but only for integer pitch lags.
[0027] For bandpass voicing analysis, we apply the subframe correlation method to estimate the correlation strength at the pitch lag for each frequency band of the input speech. The voiced/unvoiced mix determined herein with ρ is used for mix 608 of Fig. 6 of the cited application and Fig. 5 of the present application. One examines all of the frequency bands and computes a ρ for each. In this case, applicants use the forward/backward method with fractional pitch interpolation, but no weighting function is used, since applicants use the estimated integer pitch lags from the pitch search rather than performing a search.

[0028] Experimentally, the subframe-based pitch and voicing estimation performs better than the frame-based approach of the Federal Standard, particularly for speech transitions and regions of erratic pitch.

Claims 

1. A subframe-based correlation method comprising the steps of:

varying lag times T over all pitch range in a speech frame;

determining pitch lags for each subframe within said overall range that maximize the correlation value according to

$$\frac{\left(\sum_n x_n x_{n-T_s}\right)^2}{\sum_n x_{n-T_s}^2}$$

provided the pitch lags across the subframe are within a given constrained range, where $T_s$ is the subframe lag, $x_n$ is the nth sample of the input signal and the $\sum_n$ includes all samples in subframes.

2. The method of Claim 1 wherein said constrained range is $T-\Delta$ to $T+\Delta$ where T is the lag time.

3. The method of Claim 2 where $\Delta=5$.

4. The method of Claim 1 wherein the determining step further includes determining maximum correlation values of subframes $T_s$ for each value T, sum sets of $T_s$ over all pitch range and determine which set of $T_s$ provides the maximum correlation value over the range of T.

5. The method of Claim 1 wherein for each subframe performing pitch there is a weighting function to penalize pitch doubles.

6. The method of Claim 5 wherein the weighting function is

$$w(T_s) = \left(1 - T_s \frac{D}{T_{max}}\right)^2$$

where D is a value between 0 and 1 depending on the weight penalty.
7. The method of Claim 6 where D is 0.1.

8. The method of Claim 4 wherein pitch prediction comprises predictions from future values and past values.

9. The method of Claim 4 wherein pitch prediction comprises for the first half of a frame predicting current samples from future values and for the second half of the frame predicting current samples from past samples.

10. A subframe-based correlation method comprising the steps of:

varying lag times T over all pitch range in a speech frame;

determining pitch lags for each subframe within said overall range that maximize the correlation value according to

$$\frac{\left(\sum_n x_n x_{n-T_s}\right)^2}{\sum_n x_{n-T_s}^2} \times w(T_s)$$

provided the pitch lags across the subframe are within a given constrained range, where $T_s$ is the subframe lag, $x_n$ is the nth sample of the input signal, $w(T_s)$ is a weighting function to penalize pitch doubles and the $\sum_n$ includes all samples in subframes.

11. The method of Claim 10 wherein said constrained range is $T-\Delta$ to $T+\Delta$ where T is the lag time.

12. The method of Claim 11 where $\Delta=5$.

13. The method of Claim 10 wherein the determining step further includes determining maximum correlation values of subframes $T_s$ for each value T, sum sets of $T_s$ over all pitch range and determine which set of $T_s$ provides the maximum correlation value over the range of T.
14. The method of Claim 10 wherein the weighting function is

$$w(T_s) = \left(1 - T_s \frac{D}{T_{max}}\right)^2$$

where D is between 0 and 1 depending on the determined weight penalty.

15. A method of determining normalized correlation coefficient comprising the steps of:

providing a set of subframe lags $T_s$ and computing the normalized correlation for that set of $T_s$ according to

$$\rho(T) = \sqrt{\frac{\displaystyle\sum_{s=1}^{N_s} \frac{\left(\sum_n x_n x_{n-T_s}\right)^2}{\sum_n x_{n-T_s}^2}}{\displaystyle\sum_{s=1}^{N_s} \sum_n x_n^2}}$$

where $N_s$ is the number of samples in a frame and $x_n$ is the nth sample.

16. A subframe-based correlation method comprising the steps of:

varying lag times T over all pitch range in a speech frame;

determining pitch lags for each subframe within said overall range that maximize the correlation value according to

$$\sum_{s=1}^{N_s} \frac{\left(\sum_n x_n x_{n-T_s}\right)^2}{\sum_n x_{n-T_s}^2} \times w(T_s)$$

provided the pitch lags across the subframe are within a given constrained range, where $T_s$ is the subframe lag, $x_n$ is the nth sample of the input signal, $N_s$ is samples in a frame, $w(T_s)$ is a weighting function for doubles and the $\sum_n$ includes all samples in subframes.

17. The method of Claim 16 wherein said constrained range is $T-\Delta$ to $T+\Delta$ where T is the lag time.

18. The method of Claim 17 where $\Delta=5$.

19. The method of Claim 17 wherein the determining step further includes determining maximum correlation values of subframes $T_s$ for each value T, sum sets of $T_s$ over all pitch range and determine which set of $T_s$ provides the maximum correlation value over the range of T.

20. A voice coder comprising:

an encoder for voice input signals, said encoder including
a pitch estimator for determining pitch of said input signals;

a synthesizer coupled to said encoder and responsive to said input signals for providing synthesized voice output signals, said synthesizer coupled to said pitch estimator for providing synthesized output based for said determined pitch of said input signals;
said pitch estimator determining pitch according to:

$$T = \max_{T=\text{lower}}^{\text{upper}} \left[ \sum_{s=1}^{N_s} \max_{T_s=T-\Delta}^{T+\Delta} \frac{\left(\sum_n x_n x_{n-T_s}\right)^2}{\sum_n x_{n-T_s}^2} \right]$$

where $T_s$ is the subframe lag, $x_n$ is the nth sample of the input signal, $\sum_n$ includes all samples in the subframe, T is determining maximum correlation values of subframes for each value T, $N_s$ is the number of samples in a frame and $\Delta$ is the constrained range of the subframe.



21. A voice coder comprising:

an encoder for voice input signals, said encoder including means for determining sets of subframe lags $T_s$ over a pitch range; and

means for determining a normalized correlation coefficient $\rho(T)$ for a pitch path in each frequency band where $\rho(T)$ is determined by

$$\rho(T) = \sqrt{\frac{\displaystyle\sum_{s=1}^{N_s} \frac{\left(\sum_n x_n x_{n-T_s}\right)^2}{\sum_n x_{n-T_s}^2}}{\displaystyle\sum_{s=1}^{N_s} \sum_n x_n^2}}$$

where $N_s$ is the number of samples in a frame, and $x_n$ is the nth sample.

22. The voice coder of Claim 21 including means responsive to said normalized correlation coefficient for controlling for voicing decision.

23. The voice coder of Claim 21 including means responsive to said normalized correlation coefficient for controlling the modes in a multi-modal coder.

24. A voice coder comprising: 

an encoder for voice input signals said encoder including 
a pitch estimator for determining pitch of said input signals; 

a synthesizer coupled to said encoder and responsive to said input signals for providing synthesized voice out- 
put signals, said synthesizer coupled to said pitch estimator for providing synthesized output based for said 
determined pitch of said input signals; 
said pitch estimator determining pitch according to: 



T = 



^n x n x n-Ty 



*n* 2 n-T 



where $T_s$ is the subframe lag, $x_n$ is the nth sample of the input signal and $\sum_n$ includes all samples in subframes.

25. A method of determining normalized correlation coefficient at fractional pitch period comprising the steps of:

providing a set of subframe lags $T_s$;
finding a fraction q by

$$q = \frac{c(0,T_s+1)\,c(T_s,T_s) - c(0,T_s)\,c(T_s,T_s+1)}{c(0,T_s+1)\left[c(T_s,T_s) - c(T_s,T_s+1)\right] + c(0,T_s)\left[c(T_s+1,T_s+1) - c(T_s,T_s+1)\right]}$$

where c is the inner product of two vectors and the normalized correlation for subframe is determined by:

$$\rho_s(T_s+q) = \frac{(1-q)\,c(0,T_s) + q\,c(0,T_s+1)}{\sqrt{c(0,0)\left[(1-q)^2 c(T_s,T_s) + 2q(1-q)\,c(T_s,T_s+1) + q^2 c(T_s+1,T_s+1)\right]}}$$

and substituting $\rho_s(T_s+q)$ for $\rho_s$ in

$$\rho(T) = \sqrt{\frac{\sum_{s=1}^{N_s} p_s\, \rho_s^2(T_s)}{\sum_{s=1}^{N_s} p_s}}$$

where $p_s = \sum_n x_n^2$.